AITopics | Richland County

TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator

Vungarala, Deepak, Elbtity, Mohammed E., Syed, Sumiya, Alam, Sakila, Pandit, Kartik, Ghosh, Arnob, Zand, Ramtin, Angizi, Shaahin

arXiv.org Artificial Intelligence, Mar-7-2025

The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficiency requirements. Nevertheless, designing an optimal TPU remains challenging due to the high level of domain expertise required, considerable manual design time, and lack of high-quality, domain-specific datasets. This paper introduces TPU-Gen, the first Large Language Model (LLM) based framework designed to automate the exact and approximate TPU generation process, focusing on systolic array architectures. TPU-Gen is supported by a meticulously curated, comprehensive, and open-source dataset that covers a wide range of spatial array designs and approximate multiply-and-accumulate units, enabling design reuse, adaptation, and customization for different DNN workloads. The proposed framework leverages Retrieval-Augmented Generation (RAG) as an effective solution for building LLMs in a data-scarce hardware domain, addressing the most intriguing issue, hallucinations. TPU-Gen transforms high-level architectural specifications into optimized low-level implementations through an effective hardware generation pipeline. Our extensive experimental evaluations demonstrate superior performance, power, and area efficiency, with an average reduction in area and power of 92% and 96% from the manual optimization reference values. These results set new standards for driving advancements in next-generation design automation tools powered by LLMs.

large language model, machine learning, natural language, (20 more...)

2503.05951

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Multiple Transferable Neural Network Method with Domain Decomposition for Elliptic Interface Problems

Lu, Tianzheng, Ju, Lili, Zhu, Liyong

arXiv.org Artificial Intelligence, Feb-27-2025

The transferable neural network (TransNet) is a two-layer shallow neural network with pre-determined and uniformly distributed neurons in the hidden layer, and the least-squares solvers can be particularly used to compute the parameters of its output layer when applied to the solution of partial differential equations. In this paper, we integrate the TransNet technique with the nonoverlapping domain decomposition and the interface conditions to develop a novel multiple transferable neural network (Multi-TransNet) method for solving elliptic interface problems, which typically contain discontinuities in both solutions and their derivatives across interfaces. We first propose an empirical formula for the TransNet to characterize the relationship between the radius of the domain-covering ball, the number of hidden-layer neurons, and the optimal neuron shape. In the Multi-TransNet method, we assign each subdomain one distinct TransNet with an adaptively determined number of hidden-layer neurons to maintain the globally uniform neuron distribution across the entire computational domain, and then unite all the subdomain TransNets together by incorporating the interface condition terms into the loss function. The empirical formula is also extended to the Multi-TransNet and further employed to estimate appropriate neuron shapes for the subdomain TransNets, greatly reducing the parameter tuning cost. Additionally, we propose a normalization approach to adaptively select the weighting parameters for the terms in the loss function. Ablation studies and extensive experiments with comparison tests on different types of elliptic interface problems with low to high contrast diffusion coefficients in two and three dimensions are carried out to numerically demonstrate the superior accuracy, efficiency, and robustness of the proposed Multi-TransNet method.

artificial intelligence, hidden-layer neuron, machine learning, (15 more...)

2502.19893

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
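
The core idea in the abstract above, a shallow network whose hidden neurons are fixed in advance and whose output layer is obtained by least squares, can be illustrated with a small sketch. This is not the authors' implementation: it fits a 1D function rather than solving a PDE, and the tanh activation, the evenly spaced centers, the shape parameter `gamma`, the small ridge term, and all function names are assumptions made for illustration.

```python
import math

def hidden_features(x, centers, gamma):
    """Fixed (untrained) hidden layer: one tanh neuron per center, plus a constant."""
    return [math.tanh(gamma * (x - c)) for c in centers] + [1.0]

def solve_linear(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w

def fit_output_layer(xs, ys, centers, gamma, ridge=1e-8):
    """Least-squares output weights via the (slightly regularized) normal equations."""
    phi = [hidden_features(x, centers, gamma) for x in xs]
    m = len(phi[0])
    gram = [[sum(p[i] * p[j] for p in phi) + (ridge if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    rhs = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(m)]
    return solve_linear(gram, rhs)

def predict(x, weights, centers, gamma):
    return sum(w * f for w, f in zip(weights, hidden_features(x, centers, gamma)))

# Hypothetical setup: 12 neurons spread uniformly over [-1, 1], target sin(pi * x).
centers = [-1.0 + 2.0 * i / 11 for i in range(12)]
gamma = 3.0  # assumed "neuron shape" parameter
xs = [-1.0 + 2.0 * i / 100 for i in range(101)]
ys = [math.sin(math.pi * x) for x in xs]
weights = fit_output_layer(xs, ys, centers, gamma)
mse = sum((predict(x, weights, centers, gamma) - y) ** 2
          for x, y in zip(xs, ys)) / len(xs)
```

Only the output weights are computed; the hidden layer is never trained, which is what makes a direct linear solve possible in place of gradient descent.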

Analog and Multi-modal Manufacturing Datasets Acquired on the Future Factories Platform V2

Harik, Ramy, Kalach, Fadi El, Samaha, Jad, Samaha, Philip, Clark, Devon, Sander, Drew, Burns, Liam, Yousif, Ibrahim, Gadow, Victor, Mahmoud, Ahmed, Wuest, Thorsten

arXiv.org Artificial Intelligence, Feb-7-2025

This paper presents two industry-grade datasets captured during an 8-hour continuous operation of the manufacturing assembly line at the Future Factories Lab, University of South Carolina, on 08/13/2024. The datasets adhere to industry standards, covering communication protocols, actuators, control mechanisms, transducers, sensors, and cameras. Data collection utilized both integrated and external sensors throughout the laboratory, including sensors embedded within the actuators and externally installed devices. Additionally, high-performance cameras captured key aspects of the operation. In a prior experiment [1], a 30-hour continuous run was conducted, during which all anomalies were documented. Maintenance procedures were subsequently implemented to reduce potential errors and operational disruptions. The two datasets include: (1) a time-series analog dataset, and (2) a multi-modal time-series dataset containing synchronized system data and images. These datasets aim to support future research in advancing manufacturing processes by providing a platform for testing novel algorithms without the need to recreate physical manufacturing environments. Moreover, the datasets are open-source and designed to facilitate the training of artificial intelligence models, streamlining research by offering comprehensive, ready-to-use resources for various applications and projects.

artificial intelligence, dataset, degree float, (13 more...)

2502.05020

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence (1.00)

An Empirical Study of Accuracy-Robustness Tradeoff and Training Efficiency in Self-Supervised Learning

Ghofrani, Fatemeh, Jamshidi, Pooyan

arXiv.org Artificial Intelligence, Jan-6-2025

Self-supervised learning (SSL) has significantly advanced image representation learning, yet efficiency challenges persist, particularly with adversarial training. Many SSL methods require extensive epochs to achieve convergence, a demand further amplified in adversarial settings. To address this inefficiency, we revisit the robust EMP-SSL framework, emphasizing the importance of increasing the number of crops per image to accelerate learning. Unlike traditional contrastive learning, robust EMP-SSL leverages multi-crop sampling, integrates an invariance term and regularization, and reduces training epochs, enhancing time efficiency. Evaluated with both standard linear classifiers and multi-patch embedding aggregation, robust EMP-SSL provides new insights into SSL evaluation strategies. Our results show that robust crop-based EMP-SSL not only accelerates convergence but also achieves a superior balance between clean accuracy and adversarial robustness, outperforming multi-crop embedding aggregation. Additionally, we extend this approach with free adversarial training in Multi-Crop SSL, introducing the Cost-Free Adversarial Multi-Crop Self-Supervised Learning (CF-AMC-SSL) method. CF-AMC-SSL demonstrates the effectiveness of free adversarial training in reducing training time while simultaneously improving clean accuracy and adversarial robustness. These findings underscore the potential of CF-AMC-SSL for practical SSL applications. Our code is publicly available at https://github.com/softsys4ai/CF-AMC-SSL.

artificial intelligence, inductive learning, machine learning, (15 more...)

2501.03507

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs

Kabir, Ehsan, Downey, Austin R. J., Bakos, Jason D., Andrews, David, Huang, Miaoqing

arXiv.org Artificial Intelligence, Nov-27-2024

Transformer neural networks (TNN) excel in natural language processing (NLP), machine translation, and computer vision (CV) without relying on recurrent or convolutional layers. However, they have high computational and memory demands, particularly on resource-constrained devices like FPGAs. Moreover, transformer models vary in processing time across applications, requiring custom models with specific parameters. Designing custom accelerators for each model is complex and time-intensive. Some custom accelerators exist with no runtime adaptability, and they often rely on sparse matrices to reduce latency. However, hardware designs become more challenging due to the need for application-specific sparsity patterns. This paper introduces ADAPTOR, a runtime-adaptive accelerator for dense matrix computations in transformer encoders and decoders on FPGAs. ADAPTOR improves the utilization of processing elements and on-chip memory, increasing parallelism and reducing latency. It incorporates efficient matrix tiling to distribute resources across FPGA platforms and is fully quantized for computational efficiency and portability. Evaluations on Xilinx Alveo U55C data center cards and embedded platforms like VC707 and ZCU102 show that our design is 1.2× and 2.87× more power efficient than the NVIDIA K80 GPU and the i7-8700K CPU, respectively. Additionally, it achieves a speedup of 1.7× to 2.25× compared to some state-of-the-art FPGA-based accelerators.

artificial intelligence, machine learning, natural language, (16 more...)

2411.18148

Country:

Europe (1.00)
North America > United States > Arkansas > Washington County > Fayetteville (0.14)
North America > United States > South Carolina > Richland County > Columbia (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Industry:

Semiconductors & Electronics (0.88)
Information Technology (0.68)
Media > Television (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
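
The ADAPTOR abstract above mentions matrix tiling for dense matrix computations. The following is a generic software sketch of tiled matrix multiplication, not the ADAPTOR hardware design: the function names, the tile size, and the example matrices are assumptions chosen for illustration.

```python
def naive_matmul(a, b):
    """Reference dense matrix product on lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def tiled_matmul(a, b, tile):
    """Compute a @ b one (tile x tile) block at a time.

    Each block of the output is updated from one block of `a` and one block
    of `b`, so only three small blocks need to be resident at once.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # min(...) handles edge tiles when a dimension is not a
                # multiple of the tile size.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = out[i][j]
                        for t in range(k0, min(k0 + tile, k)):
                            acc += a[i][t] * b[t][j]
                        out[i][j] = acc
    return out

# Hypothetical 3x4 and 4x2 integer matrices.
a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
b = [[1, 0], [0, 1], [1, 1], [2, 0]]
result = tiled_matmul(a, b, tile=2)
```

Tiling changes only the order in which partial products are accumulated, so the result matches the untiled product for any tile size.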

Knowledge Graphs of Driving Scenes to Empower the Emerging Capabilities of Neurosymbolic AI

Wickramarachchi, Ruwan, Henson, Cory, Sheth, Amit

arXiv.org Artificial Intelligence, Nov-7-2024

In the era of Generative AI, Neurosymbolic AI is emerging as a powerful approach for tasks spanning from perception to cognition. The use of Neurosymbolic AI has been shown to achieve enhanced capabilities, including improved grounding, alignment, explainability, and reliability. However, due to its nascent stage, there is a lack of widely available real-world benchmark datasets tailored to Neurosymbolic AI tasks. To address this gap and support the evaluation of current and future methods, we introduce DSceneKG -- a suite of knowledge graphs of driving scenes built from real-world, high-quality scenes from multiple open autonomous driving datasets. In this article, we detail the construction process of DSceneKG and highlight its application in seven different tasks. DSceneKG is publicly accessible at: https://github.com/ruwantw/DSceneKG

artificial intelligence, machine learning, natural language, (16 more...)

2411.03225

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture

Elbtity, Mohammed, Chandarana, Peyton, Zand, Ramtin

arXiv.org Artificial Intelligence, Jul-11-2024

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML accelerators, like graphics processing units (GPUs), being designed specifically to perform the multiply-accumulate (MAC) operations required in the matrix-matrix and matrix-vector multiplies extensively present throughout the execution of deep neural networks (DNNs). Such improvements include maximizing data reuse and minimizing data transfer by leveraging the temporal dataflow paradigms provided by the systolic array architecture. While this design provides a significant performance benefit, the current implementations are restricted to a single dataflow consisting of either input, output, or weight stationary architectures. This can limit the achievable performance of DNN inference and reduce the utilization of compute units. Therefore, the work herein consists of developing a reconfigurable dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow per layer during run-time. Our experiments thoroughly test the viability of the Flex-TPU, comparing it to conventional TPU designs across multiple well-known ML workloads. The results show that our Flex-TPU design achieves a significant performance increase of up to 2.75x compared to conventional TPU, with only minor area and power overheads.

artificial intelligence, deep learning, machine learning, (19 more...)

2407.08700

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
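
The Flex-TPU abstract above refers to output-stationary and weight-stationary dataflows and to choosing a dataflow per layer. A rough software analogy, assuming that loop order can stand in for what stays fixed in a processing element, is sketched below. It is not the Flex-TPU hardware: the function names, the `DATAFLOWS` table, and the per-layer assignments are hypothetical.

```python
def matmul_output_stationary(a, b):
    """Each output (i, j) keeps its accumulator in place while operands stream by."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0  # accumulator stays put for the whole reduction
            for t in range(k):
                acc += a[i][t] * b[t][j]
            out[i][j] = acc
    return out

def matmul_weight_stationary(a, b):
    """Each weight b[t][j] is fetched once and reused for every input row."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for t in range(k):
        for j in range(m):
            w = b[t][j]  # weight held fixed while inputs stream past it
            for i in range(n):
                out[i][j] += a[i][t] * w
    return out

# Hypothetical lookup used to pick a dataflow for each layer at run time.
DATAFLOWS = {"OS": matmul_output_stationary, "WS": matmul_weight_stationary}

def run_layers(x, layers):
    """Apply each (weights, dataflow) layer in turn, switching dataflow per layer."""
    for weights, dataflow in layers:
        x = DATAFLOWS[dataflow](x, weights)
    return x

x = [[1, 2], [3, 4]]
layers = [([[1, 0], [0, 1]], "OS"), ([[2, 1], [0, 3]], "WS")]
y = run_layers(x, layers)
```

Both functions compute the same product; they differ only in which operand is reused in place, which is the property a per-layer dataflow choice exploits.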

Can GPT-4 Help Detect Quit Vaping Intentions? An Exploration of Automatic Data Annotation Approach

Vuruma, Sai Krishna Revanth, Wu, Dezhi, Gupta, Saborny Sen, Aust, Lucas, Lookingbill, Valerie, Bellamy, Wyatt, Ren, Yang, Kasson, Erin, Chen, Li-Shiun, Cavazos-Rehg, Patricia, Hu, Dian, Huang, Ming

arXiv.org Artificial Intelligence, Jun-28-2024

In recent years, the United States has witnessed a significant surge in the popularity of vaping or e-cigarette use, leading to a notable rise in cases of e-cigarette and vaping use-associated lung injury (EVALI) that caused hospitalizations and fatalities during the EVALI outbreak in 2019. This highlights the urgency of understanding vaping behaviors and developing effective strategies for cessation. Due to the ubiquity of social media platforms, over 4.7 billion users worldwide use them for connectivity, communications, news, and entertainment, with a significant portion of the discourse related to health, thereby establishing social media data as an invaluable organic data resource for public health research. In this study, we extracted a sample dataset from one vaping sub-community on Reddit to analyze users' quit-vaping intentions. Leveraging OpenAI's latest large language model GPT-4 for sentence-level quit-vaping intention detection, this study compares the outcomes of this model against layman and clinical expert annotations. Using different prompting strategies such as zero-shot, one-shot, few-shot, and chain-of-thought prompting, we developed 8 prompts with varying levels of detail to explain the task to GPT-4 and also evaluated the performance of the strategies against each other. These preliminary findings emphasize the potential of GPT-4 in social media data analysis, especially in identifying users' subtle intentions that may elude human detection.

annotator, large language model, machine learning, (15 more...)

2407.00167

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Public Health (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)
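
The prompting strategies named in the abstract above (zero-shot, one-shot, few-shot, chain-of-thought) differ mainly in how many labeled examples the prompt contains and whether it asks for intermediate reasoning. The sketch below only assembles prompt text and calls no model. The task wording, the example sentences, and the function name are hypothetical and are not the study's 8 prompts.

```python
# Hypothetical task description; the study's actual prompt wording is not
# given in the abstract.
TASK = ("Decide whether the sentence expresses an intention to quit vaping. "
        "Answer 'yes' or 'no'.")

def build_prompt(sentence, examples=(), chain_of_thought=False):
    """Assemble a prompt: 0 examples = zero-shot, 1 = one-shot, several = few-shot."""
    parts = [TASK]
    for text, label in examples:
        parts.append(f"Example sentence: {text}\nAnswer: {label}")
    if chain_of_thought:
        parts.append("Think step by step before giving the final answer.")
    parts.append(f"Sentence: {sentence}\nAnswer:")
    return "\n\n".join(parts)

# Invented illustration data, not drawn from the Reddit dataset.
demo_examples = [
    ("I'm throwing out my last pod tonight.", "yes"),
    ("Just picked up a new flavor, it's great.", "no"),
    ("Day three without nicotine and counting.", "yes"),
]

zero_shot = build_prompt("Thinking about stopping next month.")
one_shot = build_prompt("Thinking about stopping next month.", demo_examples[:1])
few_shot = build_prompt("Thinking about stopping next month.", demo_examples)
cot = build_prompt("Thinking about stopping next month.", chain_of_thought=True)
```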

BEACON: Balancing Convenience and Nutrition in Meals With Long-Term Group Recommendations and Reasoning on Multimodal Recipes

Nagpal, Vansh, Valluru, Siva Likitha, Lakkaraju, Kausik, Srivastava, Biplav

arXiv.org Artificial Intelligence, Jun-19-2024

A common, recurring decision that people make, whether healthy or living with a health condition, is what to have in meals such as breakfast, lunch, and dinner, each consisting of a combination of foods for appetizer, main course, side dishes, desserts, and beverages. However, this decision is often seen as a trade-off between nutritious choices (e.g., low salt and sugar) and convenience (e.g., inexpensive, fast to prepare or obtain, better tasting). In this preliminary work, we present a data-driven approach for the novel meal recommendation problem that can explore and balance choices for both considerations while also reasoning about a food's constituents and cooking process. Beyond the problem formulation, our contributions also include a goodness measure, a recipe conversion method from text to the recently introduced multimodal rich recipe representation (R3) format, and learning methods using contextual bandits that show promising results.

artificial intelligence, machine learning, natural language, (13 more...)

2406.13714

Country: North America > United States > South Carolina > Richland County > Columbia (0.15)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Consumer Health (1.00)
Consumer Products & Services (0.96)
Health & Medicine > Therapeutic Area > Endocrinology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Multi-Objective Neural Architecture Search for In-Memory Computing

Amin, Md Hasibul, Mohammadi, Mohammadreza, Zand, Ramtin

arXiv.org Artificial Intelligence, Jun-10-2024

In this work, we employ neural architecture search (NAS) to enhance the efficiency of deploying diverse machine learning (ML) tasks on in-memory computing (IMC) architectures. Initially, we design three fundamental components inspired by the convolutional layers found in VGG and ResNet models. Subsequently, we utilize Bayesian optimization to construct a convolutional neural network (CNN) model with adaptable depths, employing these components. Through the Bayesian search algorithm, we explore a vast search space comprising over 640 million network configurations to identify the optimal solution, considering various multi-objective cost functions like accuracy/latency and accuracy/energy. Our evaluation of this NAS approach for IMC architecture deployment spans three distinct image classification datasets, demonstrating the effectiveness of our method in achieving a balanced solution characterized by high accuracy and reduced latency and energy consumption.

artificial intelligence, deep learning, machine learning, (17 more...)

2406.06746

Country: North America > United States > South Carolina > Richland County > Columbia (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)
